Substructure Discovery in the SUBDUE System
Abstract
Because many databases contain or can be embellished with structural information, a method for identifying interesting and repetitive substructures is an essential component of discovering knowledge in such databases. This paper describes the SUBDUE system, which uses the minimum description length (MDL) principle to discover substructures that compress the database and represent structural concepts in the data. By replacing previously discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. Inclusion of background knowledge guides SUBDUE toward appropriate substructures for a particular domain or discovery goal, and the use of an inexact graph match allows a controlled amount of deviation in the instances of a substructure concept. We describe the application of SUBDUE to a variety of domains. We also discuss approaches to combining SUBDUE with non-structural discovery systems.
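The compression idea in the abstract can be illustrated with a minimal Python sketch. It is not SUBDUE's implementation: it assumes the instances of a substructure are already known and do not overlap, and it uses a crude size measure (vertex count plus edge count) as a stand-in for a true MDL bit encoding. The function names `description_size`, `compress`, and `compression_value` are hypothetical.

```python
def description_size(vertices, edges):
    """Crude proxy for description length: vertices plus edges (not real MDL bits)."""
    return len(vertices) + len(edges)


def compress(vertices, edges, instances, label="SUB"):
    """Replace each instance (a set of vertex ids) with a single new vertex.

    vertices: dict mapping vertex id -> label
    edges: list of (source id, target id, edge label) tuples
    instances: list of non-overlapping sets of vertex ids
    """
    owner = {}
    for i, instance in enumerate(instances):
        for v in instance:
            if v in owner:
                raise ValueError("instances must not overlap")
            owner[v] = ("SUB", i)  # id of the vertex that replaces this instance

    new_vertices = {v: lab for v, lab in vertices.items() if v not in owner}
    for i in range(len(instances)):
        new_vertices[("SUB", i)] = label

    new_edges = []
    for u, v, lab in edges:
        cu, cv = owner.get(u, u), owner.get(v, v)
        if u in owner and cu == cv:
            continue  # edge lies inside one instance, so it is absorbed
        new_edges.append((cu, cv, lab))
    return new_vertices, new_edges


def compression_value(vertices, edges, sub_vertices, sub_edges, instances):
    """Ratio size(G) / (size(S) + size(G|S)); values above 1 indicate compression."""
    g_vertices, g_edges = compress(vertices, edges, instances)
    before = description_size(vertices, edges)
    after = description_size(sub_vertices, sub_edges) + description_size(g_vertices, g_edges)
    return before / after


# Toy graph: two A-B-C triangles, each attached to a shared vertex D.
vertices = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B", 6: "C", 7: "D"}
edges = [(1, 2, "e"), (2, 3, "e"), (3, 1, "e"),
         (4, 5, "e"), (5, 6, "e"), (6, 4, "e"),
         (3, 7, "on"), (6, 7, "on")]

# The candidate substructure is the triangle; its two instances are given by hand.
sub_vertices = {"a": "A", "b": "B", "c": "C"}
sub_edges = [("a", "b", "e"), ("b", "c", "e"), ("c", "a", "e")]
instances = [{1, 2, 3}, {4, 5, 6}]

value = compression_value(vertices, edges, sub_vertices, sub_edges, instances)
print(round(value, 3))  # 1.364
```

Running `compress` again on its own output, with a second substructure defined over the new `SUB` vertices, would give a second level of description; this mirrors the multiple-pass hierarchy described in the abstract only in spirit.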
Similar resources
Graph-Based Hierarchical Conceptual Clustering
Hierarchical conceptual clustering has proven to be a useful, although under-explored, data mining technique. A graph-based representation of structural information combined with a substructure discovery technique has been shown to be successful in knowledge discovery. The SUBDUE substructure discovery system provides one such combination of approaches. This work presents SUBDUE and the develop...
Substructure Discovery in the SUBDUE System
Because many databases contain or can be embellished with structural information, a method for identifying interesting and repetitive substructures is an essential component of discovering knowledge in such databases. This paper describes the Subdue system, which uses the minimum description length (MDL) principle to discover substructures that compress the database and represent structural co...
Structural Knowledge Discovery Used to Analyze Earthquake Activity
The Subdue structural discovery system is being used as the Data Mining tool to study the "Orizaba Fault" located in Mexico, as part of a research project of the geologist Dr. Burke Burkart. We analyze the information of the Earthquake Database to discover if the earthquake activity in the area is related to the fault. We experimented with different samples of data mainly using two heuristics t...
Substructure Discovery Using Minimum Description Length and Background Knowledge
The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our Subdue substructure discovery system based on the minimum description length principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previo...
Graph-based Hierarchical Conceptual Clustering
Hierarchical conceptual clustering has proven to be a useful data mining technique. Graph-based representation of structural information has been shown to be successful in knowledge discovery. The Subdue substructure discovery system provides the advantages of both approaches. In this paper we present Subdue and focus on its clustering capabilities. We use two examples to illustrate the val...